The JH4 intron within the immunoglobulin locus is subject to somatic hypermutation and therefore varies in sequence compared with the germline / reference.
This locus has been amplified as a 433 bp amplicon and sequenced with 2 × 250 bp paired-end reads.
The goal is to align the reads with the reference sequence, quantify the number of mutations per amplicon, plot a distribution of mutational load within each sample, then compare the distribution between samples.
In analysis version 1, we took two approaches to characterise the mutations within the reads:
For the CrispRVariants approach, the package didn’t work particularly well for this type of data: it assumes that reads entirely span the region under investigation, it references mutations to a guide sequence / location, and its notation for mutations was difficult to interpret.
For the Mutect2 approach, we obtained the number of variants called per sample, but not the number of variants called per read.
In version 2, we took a back-to-basics approach to re-evaluate the mutations in the data. We stitched the read pairs into consensus reads, collapsed them into sets of identical sequences using fastx_collapser, aligned them to the reference with MAFFT, and then compared each position with the reference, keeping track of differences.
The following requests were made to refine the version 2 analysis:
library(readr)
library(ggplot2)
library(tidyverse)
library(data.table)
This is the same as in the first version of the analysis. We’ll stitch the forward and reverse reads together into a single consensus sequence for the amplicon. This step uses the command-line suite BBTools, which includes a tool called bbmerge.sh that overlaps paired reads into consensus reads.
#!/usr/bin/bash
module purge
module load Miniconda3/4.7.10
conda activate bbtools
bbmerge.sh in=$read1.fastq.gz in2=$read2.fastq.gz out=merged_reads/$sample.fastq.gz
This produced a folder containing the consensus reads for each sample:
merged_reads/*.fastq.gz
In this work, we’re interested in how many mutations are present per read. To simplify the input, we can collapse the reads into sets of identical sequences and count how many reads are in each set. We can use a program called fastx_collapser for that.
#!/usr/bin/bash
module purge
module load fastx_toolkit # TODO: confirm the proper module name for this
fastx_collapser -v -Q 33 -i merged_reads/input.fastq -o collapsed/collapsed_output.fasta
This produces a folder with the collapsed read sequences. The fasta header for each sequence gives its rank order (in terms of abundance) and the number of reads that share that sequence.
collapsed/*.fasta
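Assuming fastx_collapser’s usual `>rank-count` header format (e.g. `>1-343` for the most abundant sequence, seen in 343 reads), the rank and count can be pulled out in R; a minimal sketch (the function name is made up):

```r
## Sketch: parse a fastx_collapser fasta header (assumed form ">rank-count")
## into its rank and read count.
parse.collapsed.header <- function(header) {
  parts <- as.numeric(strsplit(sub('^>', '', header), '-')[[1]])
  c(rank = parts[1], count = parts[2])
}
parse.collapsed.header('>1-343')  # rank = 1, count = 343
```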
Following this, we can use MAFFT to run a multiple sequence alignment on the fasta file for each sample, using MAFFT’s --add argument to align the reads against the reference.
#!/usr/bin/bash
module purge
module load MAFFT # TODO: confirm the proper module name for this
mafft --auto --add collapsed/input.fasta reference.fasta > mafft/aligned.aln
Here, we’ve produced a set of multifasta format alignments (one for each sample):
mafft/*.aln
It’s actually pretty difficult to check whether the reads are truncated or foreshortened, rather than just detecting these as deletions. To start, we’ll simply remove all the reads that have any indels at all and see how many reads remain.
working.dir <- '/Volumes/babs/working/goldstr2/projects/caladod/anqi.xu/DN19023-ax343/'
setwd(working.dir)
dd <- read.csv('mutation_counts.tsv', sep="\t")
dd <- data.table(dd)
knitr::kable(dd[1,])
| sample | rank | count | snps | deletions | insertions | total | length | sequence | conversions |
|---|---|---|---|---|---|---|---|---|---|
| DCKP93B_BM_5DPOG | 1 | 343 | 0 | 0 | 0 | 0 | 433 | -ccccactccactc-tttgtccctatgcatcggatactgtataaatgctgtcacagaggtggtcctgaagtatattccacaactaatacttttattctaaaaactgaaaatctccaactacagccccaactatccctccagccataggattgttttagcatcttcttcaaatgagcctccaaagtccctatcccatcatccagggactccaccaacaccatcacacagattcttagtttttcaaagatgtggagataatctgtcctaaaggctctgagatccctagacagtttatttcccaacttctctcagccggctccctcagggacaaatatccaagattagtctgcaatgctcagaaaactccataacaaaggttaaaaataaagacctggagagg---ccattcttacctgaggagacggtgactgaggttcc----- | A:C=0;A:G=0;A:T=0;C:A=0;C:G=0;C:T=0;G:A=0;G:C=0;G:T=0;T:A=0;T:C=0;T:G=0 |
dd.orig <- dd
## first, let's just remove anything that has an indel at all
dd <- dd[which( (dd$deletions + dd$insertions) == 0),]
dim(dd)[1] / dim(dd.orig)[1] * 100
## [1] 85.69623
By removing reads with any indel, we’re left with ~86% of reads - I think that’s probably OK.
Within each alignment, we have the aligned version of the reference. We can walk over each of the collapsed read sets and compare each position to this reference: where we have a substitution, we count it; where the reference has a gap (‘-’) and the experimental sequence has a base, we count an insertion; where the reference has a base and the experimental sequence has a gap, we count a deletion. We also keep track of each substitution individually, to produce tables like those in the paper. I did this using a simple Perl script, but it could be done in R (probably quite a lot slower).
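As a rough illustration, the position-by-position logic could be sketched in R like this (the actual analysis used a Perl script; the function name and example inputs here are made up):

```r
## A minimal sketch (not the script actually used) of the per-position
## comparison, assuming the aligned reference and read are equal-length
## strings taken from the same MAFFT alignment.
compare.to.reference <- function(ref, read) {
  ref.chars  <- strsplit(tolower(ref),  '')[[1]]
  read.chars <- strsplit(tolower(read), '')[[1]]
  stopifnot(length(ref.chars) == length(read.chars))
  snps <- insertions <- deletions <- 0
  for (i in seq_along(ref.chars)) {
    r <- ref.chars[i]
    q <- read.chars[i]
    if (r == q) next
    if (r == '-') {
      insertions <- insertions + 1   # gap in reference, base in read
    } else if (q == '-') {
      deletions <- deletions + 1     # base in reference, gap in read
    } else {
      snps <- snps + 1               # two different bases: substitution
    }
  }
  c(snps = snps, insertions = insertions, deletions = deletions)
}
compare.to.reference('acgt-a', 'aggtaa')  # snps = 1, insertions = 1, deletions = 0
```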
This results in a file named mutation_counts.tsv. We can
load that here and take a look at what it looks like (we’ll just look at
the first row because it’s a massive table and takes a long time to show
the whole thing)
The entry we can see here is from the sample DCKP93B_BM_5DPOG. It is the most abundant sequence in the merged fastq file (rank = 1), and was found in 343 reads in this sample. Relative to the reference, it has 0 SNPs, 0 insertions and 0 deletions; the total (edit) distance is 0, i.e. the sequence is identical to the reference. The aligned sequence is shown (sequence) alongside a map of the substitutions (conversions).
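Incidentally, the conversions column can be unpacked in R with a couple of strsplit calls; a minimal sketch (the function name is made up):

```r
## Sketch: unpack a conversions string like "A:C=0;A:G=2;..." into a
## named numeric vector (names are reference:read base pairs).
parse.conversions <- function(s) {
  pairs <- strsplit(unlist(strsplit(s, ';')), '=')
  counts <- as.numeric(sapply(pairs, `[`, 2))
  names(counts) <- sapply(pairs, `[`, 1)
  counts
}
parse.conversions('A:C=0;A:G=2;C:T=1')  # A:C = 0, A:G = 2, C:T = 1
```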
To explore any groups later on, we’ll add some columns to our dataset that give the group assignment
my.samplename.list <- strsplit(dd$sample,'_')
my.tissue <- vector(length = length(my.samplename.list))
my.time <- vector(length = length(my.samplename.list))
my.group <- vector(length = length(my.samplename.list))
for (i in 1:length(my.samplename.list)) {
my.group[i] <- paste(my.samplename.list[[i]][-1], collapse="_")
my.tissue[i] <- my.samplename.list[[i]][2]
my.time[i] <- my.samplename.list[[i]][3]
}
dd$group <- my.group
dd$tissue <- my.tissue
dd$time <- my.time
First, we’ll take a look at the raw data to get a feel for what we’re dealing with.
We’ll start with the distribution of sequence lengths - we’re expecting the sequences to be ~433 bp long, as this is the length of the JH4 intron that was amplified. We’ve already removed anything that’s not 433 bp, so this is just a check.
ggplot(dd, aes(x=sample, y=length)) +
geom_point(aes(colour=time, shape=tissue)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
All the sequences are 433 bp
We already know from the initial analysis that some of the samples have low read counts, but we can confirm it in the current data by plotting the sum of each ‘count’ per sample
ggplot(dd, aes(x=sample, y=count, fill=tissue)) +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Similar to before, some samples do have very low numbers of reads - this is likely to be problematic for estimating mutations, as we’re likely to detect fewer mutations in these samples just by chance.
With Mutect2, we saw that samples with fewer than 400 reads had pretty unreliable mutation counts - probably the result of PCR amplification errors in very low-input samples.
In our data, we’ve tracked the number of SNPs, the number of indels and the total number of mutations (which we can call the edit distance). Here we’ll look at SNPs rather than edit distance, since SNPs are probably a more reliable estimate of ‘true’ mutations.
dd.1 <- dd[, list(sum(count),mean(snps), unique(tissue), unique(time)), by=list(sample)]
colnames(dd.1) <- c('sample', 'sum.count', 'mean.snps', 'tissue', 'time')
ggplot(dd.1, aes(x=mean.snps, y=sum.count)) +
geom_point(aes(colour=time, shape=tissue)) +
geom_hline(aes(yintercept=400), color="blue", linetype="dashed")
Similar to our Mutect2 data, samples with fewer than 400 reads may be unreliable in the number of mutations that are called.
Let’s take a look at what the affected samples are - here we’ll be losing entire samples
too.few.reads <- dd.1$sample[which(dd.1$sum.count < 400)]
print(too.few.reads)
## [1] "DCKP92E_BM_5MPOG" "DCKP92f_BM_5MPOG" "DCKP92h_BM_5MPOG"
## [4] "DCKP92e_SP_5MPOG" "DCKP92g_SP_5MPOG" "DCKP92h_SP_5MPOG"
## [7] "DCKP103D_SP_9MPOG" "DCKP106E_SP_9MPOG" "DCKP106F_SP_9MPOG"
## [10] "DCKP106E_BM_9MPOG" "DCKP151C_BM_9MPOG"
Reluctantly, I think it’s also worthwhile ditching the samples that have fewer than 400 reads. I don’t think that, on balance, we can have much faith in the data from these.
dd.filtered <- dd[!(dd$sample %in% too.few.reads),] ## remove samples with fewer than 400 reads
We might expect ‘true’ mutations to be called in multiple reads in a sample. If a set is represented by only 1 read, we might suspect sequencing error or some other source of noise (of course, these could also be genuinely low-frequency mutations, but in that case they would be inseparable from the noise).
We can look at whether the number of reads per set interacts at all with the number of called mutations in that set. As we are interested in the lower counts, we’ll set the x-axis limits of the plot so we can zoom into this area.
ggplot(dd.filtered, aes(x=count, y=total)) +
geom_point(aes(colour=time, shape=tissue)) +
xlim(c(0,20))
We can see here that, yes, sets that are represented by only 1 read have a wide range of edit distances, including upwards of 35 mutations which is rarely seen at higher coverage
It’s probably necessary to settle on some sort of filter here. Very liberally, we can require a depth of at least 2 reads to call a set ‘true’.
dd.filtered <- dd.filtered[which(dd.filtered$count > 1),]
Now that we’ve filtered our sequences down to what we hope are robust sets and samples, what has that done to our dataset?
# we can look at the number of rows in the table to see how many sets we're left with:
dim(dd.filtered)[1]
## [1] 4977
# Or look at that as a percentage of the total number of sets we started with:
dim(dd.filtered)[1] / dim(dd.orig)[1]*100
## [1] 10.48606
It’s perhaps not fantastic news - we’ve lost around 90% of our original sets. We’re left with 4977 sets, slightly fewer than using the previous filters (5269 sets), because here we’ve also excluded all sets containing indels.
For interest’s sake, we can look at how many sets each sample now contains after filtering:
dd.filtered.1 <- dd.filtered[, list(length(count), unique(tissue), unique(time)), by=list(sample)]
colnames(dd.filtered.1) <- c('sample', 'num.sets', 'tissue', 'time')
ggplot(dd.filtered.1, aes(x=sample, y=num.sets, fill=tissue)) +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
So, it doesn’t look very promising here, I suspect that we’ll see a
strong relationship between the number of sets per sample and the total
number of reads
dd.filtered.1 <- dd.filtered[, list(length(count), sum(count), unique(tissue), unique(time)), by=list(sample)]
colnames(dd.filtered.1) <- c('sample', 'num.sets', 'num.reads', 'tissue', 'time')
ggplot(dd.filtered.1, aes(x=num.sets, y=num.reads)) +
geom_point(aes(colour=time, shape=tissue))
The detection of heterogeneity in each sample is heavily dependent on the number of reads available.
Bearing in mind the shortcomings of the data, we can still try to see whether there are any patterns in it that make biological sense. I imagine that we’re expecting to see mutations accrue over time.
We’ll look at SNPs rather than edit distance - we’ve already removed all the sets containing indels anyway.
dd.filtered.groups <- dd.filtered[,list(mean(snps), unique(tissue), unique(time)), by=list(group)]
colnames(dd.filtered.groups) <- c('group', 'mean.snps', 'tissue', 'time')
dd.filtered.groups$group <- factor(dd.filtered.groups$group, levels = c('BM_5DPOG', 'BM_2MPOG', 'BM_4MPOG', 'BM_5MPOG', 'BM_9MPOG', 'SP_5DPOG', 'SP_2MPOG', 'SP_4MPOG', 'SP_5MPOG', 'SP_9MPOG', 'SP_NEGCTRL', 'SP_POSCTRL'))
ggplot(dd.filtered.groups, aes(x=group, y=mean.snps, fill=tissue)) +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
The paper at
https://www.jimmunol.org/content/jimmunol/early/2019/08/09/jimmunol.1900483.full.pdf
had some pie charts showing, for each sample, the proportion of reads with 1, 2, 3, 4, 5, 6 and 7+ mutations. Pie charts are a pretty poor way to represent data: it’s quite difficult to estimate area accurately in a pie chart, and the same information is probably better shown as a proportional stacked bar chart.
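To illustrate the stacked-bar alternative, here’s a sketch with made-up counts (in practice, the per-bin counts come from the table built in the next chunk):

```r
library(ggplot2)
## Made-up counts for illustration: number of sets with 0, 1 or 2 SNPs
## in two hypothetical groups.
toy <- data.frame(
  group = rep(c('BM_5DPOG', 'SP_5DPOG'), each = 3),
  bin   = rep(c('zero', 'one', 'two'), times = 2),
  n     = c(50, 30, 20, 70, 20, 10)
)
ggplot(toy, aes(x = group, y = n, fill = bin)) +
  geom_bar(stat = "identity", position = "fill") +  # "fill" scales each bar to sum to 1
  ylab("proportion of sets")
```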
First we’ll initialise a data.frame to store results, and then count how many sets fall into each SNP-count category for each group.
my.results <- data.frame('group' = unique(dd.filtered$group), 'zero' = 0, 'one' = 0, 'two' = 0, 'three' = 0, 'four' = 0, 'five' = 0, 'six' = 0, 'seven.plus' = 0)
for (i in 1:length(my.results$group)) {
my.group <- dd.filtered$snps[which(dd.filtered$group == my.results$group[i])]
my.results$zero[i] = length(which(my.group == 0))
my.results$one[i] = length(which(my.group == 1))
my.results$two[i] = length(which(my.group == 2))
my.results$three[i] = length(which(my.group == 3))
my.results$four[i] = length(which(my.group == 4))
my.results$five[i] = length(which(my.group == 5))
my.results$six[i] = length(which(my.group == 6))
my.results$seven.plus[i] = length(which(my.group >= 7))
}
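For what it’s worth, the binning above could also be done in one step with cut(); a sketch:

```r
## Sketch: bin SNP counts into the same categories with cut() instead of
## repeated which() calls.
bin.snps <- function(snps) {
  cut(snps,
      breaks = c(-Inf, 0:6, Inf),  # intervals (-Inf,0], (0,1], ..., (6,Inf]
      labels = c('zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven.plus'))
}
table(bin.snps(c(0, 0, 1, 3, 7, 12)))  # zero = 2, one = 1, three = 1, seven.plus = 2
```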
Next we’ll define a function to plot a pie chart (so we don’t have to type it out over and over again) and use it to plot our pies.
plot.pie <- function(x) {
my.melt <- reshape2::melt(x)
ggplot(my.melt, aes(x="", y=value, fill=variable))+
geom_bar(width = 1, stat = "identity")+
coord_polar("y", start=0) +
theme(axis.text.x=element_blank()) +
ggtitle(x$group)
}
for (i in 1:dim(my.results)[1]) {
f <- plot.pie(rev(my.results[i,]))
print(f)
}
my.results <- data.frame('group' = unique(dd.filtered$sample), 'zero' = 0, 'one' = 0, 'two' = 0, 'three' = 0, 'four' = 0, 'five' = 0, 'six' = 0, 'seven.plus' = 0)
for (i in 1:length(my.results$group)) {
my.group <- dd.filtered$snps[which(dd.filtered$sample == my.results$group[i])]
my.results$zero[i] = length(which(my.group == 0))
my.results$one[i] = length(which(my.group == 1))
my.results$two[i] = length(which(my.group == 2))
my.results$three[i] = length(which(my.group == 3))
my.results$four[i] = length(which(my.group == 4))
my.results$five[i] = length(which(my.group == 5))
my.results$six[i] = length(which(my.group == 6))
my.results$seven.plus[i] = length(which(my.group >= 7))
f <- plot.pie(rev(my.results[i,]))
print(f)
}
We want a function to parse the substitution tables out of the data. Making it a function means we can easily run it multiple times if we need to. We can then use it to produce the substitution tables for each sample. Here, we’re using the filtered data, but the function makes it easy to rerun on the pre-filtered data if we wanted to.
get.conversions <- function(x) {
conversions <- strsplit(x$conversions,';')
conv.tables <- list()
for (i in 1:length(unique(x$sample))) {
result.table <- matrix(nrow=4, ncol=4, 0)
colnames(result.table) <- c('A', 'T', 'C', 'G')
rownames(result.table) <- c('A', 'T', 'C', 'G')
sample.conversions <- conversions[which(x$sample == unique(x$sample)[i])]
for (j in 1:length(sample.conversions)) {
for (k in 1:length(sample.conversions[[j]])) {
conv.val <- unlist(strsplit(sample.conversions[[j]][k],'='))
conv <- unlist(strsplit(conv.val[1],':'))
result.table[conv[1], conv[2]] <- result.table[conv[1], conv[2]] + as.numeric(conv.val[2])
}
}
conv.tables[[i]] <- result.table
}
names(conv.tables) <- unique(x$sample)
return(conv.tables)
}
conversion.tables <- get.conversions(dd.filtered)
print(conversion.tables)
## $DCKP93B_BM_5DPOG
## A T C G
## A 0 0 1 55
## T 3 0 62 7
## C 7 15 0 3
## G 13 3 7 0
##
## $DCKP103B_BM_5DPOG
## A T C G
## A 0 4 3 221
## T 3 0 222 2
## C 3 13 0 0
## G 5 1 0 0
##
## $DCKP141F_BM_5DPOG
## A T C G
## A 0 19 33 56
## T 15 0 78 29
## C 1 48 0 2
## G 18 2 0 0
##
## $DCKP141G_BM_5DPOG
## A T C G
## A 0 4 7 37
## T 38 0 51 32
## C 35 3 0 9
## G 17 29 5 0
##
## $DCKP106C_BM_5DPOG
## A T C G
## A 0 25 4 33
## T 70 0 26 3
## C 33 55 0 1
## G 52 0 17 0
##
## $DCKP141A_BM_5DPOG
## A T C G
## A 0 4 6 47
## T 9 0 48 15
## C 6 30 0 12
## G 15 10 10 0
##
## $DCKP141F_SP_5DPOG
## A T C G
## A 0 6 5 18
## T 36 0 97 4
## C 8 13 0 14
## G 5 5 0 0
##
## $DCKP141G_SP_5DPOG
## A T C G
## A 0 5 1 37
## T 4 0 30 0
## C 5 7 0 1
## G 5 4 1 0
##
## $DCKP106C_SP_5DPOG
## A T C G
## A 0 5 0 36
## T 0 0 53 1
## C 4 0 0 0
## G 22 0 0 0
##
## $DCKP141A_SP_5DPOG
## A T C G
## A 0 15 0 24
## T 26 0 40 20
## C 7 36 0 12
## G 9 2 20 0
##
## $DCKP93C_BM_2MPOG
## A T C G
## A 0 18 28 94
## T 17 0 115 17
## C 11 38 0 13
## G 41 10 8 0
##
## $DCKP93D_BM_2MPOG
## A T C G
## A 0 3 3 91
## T 13 0 86 12
## C 30 63 0 9
## G 20 8 0 0
##
## $DCKP141E_BM_2MPOG
## A T C G
## A 0 3 2 89
## T 1 0 100 2
## C 7 26 0 0
## G 12 8 0 0
##
## $DCKP141D_BM_2MPOG
## A T C G
## A 0 4 2 85
## T 25 0 92 21
## C 5 20 0 2
## G 10 4 0 0
##
## $DCKP141B_BM_2MPOG
## A T C G
## A 0 3 2 89
## T 3 0 74 1
## C 9 21 0 0
## G 7 6 0 0
##
## $DCKP141E_SP_2MPOG
## A T C G
## A 0 3 5 89
## T 1 0 83 5
## C 13 7 0 5
## G 16 3 0 0
##
## $DCKP141D_SP_2MPOG
## A T C G
## A 0 6 5 68
## T 3 0 94 8
## C 7 35 0 0
## G 17 3 0 0
##
## $DCKP141C_SP_2MPOG
## A T C G
## A 0 7 0 66
## T 0 0 59 1
## C 5 18 0 1
## G 8 6 116 0
##
## $DCKP141B_SP_2MPOG
## A T C G
## A 0 2 2 79
## T 176 0 66 2
## C 4 187 0 0
## G 176 3 0 0
##
## $DCKP151B_BM_4MPOG
## A T C G
## A 0 1 2 69
## T 4 0 69 2
## C 7 12 0 1
## G 7 5 1 0
##
## $DCKP141I_BM_4MPOG
## A T C G
## A 0 2 4 90
## T 6 0 85 8
## C 11 18 0 5
## G 14 4 1 0
##
## $DCKP141H_BM_4MPOG
## A T C G
## A 0 0 2 58
## T 109 0 54 0
## C 6 6 0 0
## G 4 3 0 0
##
## $DCKP151B_SP_4MPOG
## A T C G
## A 0 2 3 76
## T 1 0 60 1
## C 7 7 0 0
## G 4 4 0 0
##
## $DCKP141I_SP_4MPOG
## A T C G
## A 0 0 2 37
## T 2 0 44 1
## C 2 8 0 0
## G 4 4 0 0
##
## $DCKP141H_SP_4MPOG
## A T C G
## A 0 0 1 52
## T 30 0 87 58
## C 7 9 0 0
## G 7 9 0 0
##
## $DCKP92g_BM_5MPOG
## A T C G
## A 0 2 1 53
## T 5 0 55 3
## C 10 10 0 0
## G 6 4 0 0
##
## $DCKP103A_SP_9MPOG
## A T C G
## A 0 0 0 7
## T 1 0 13 0
## C 1 11 0 0
## G 0 0 0 0
##
## $DCKP103C_SP_9MPOG
## A T C G
## A 0 1 1 45
## T 3 0 46 1
## C 0 3 0 0
## G 3 3 0 0
##
## $DCKP151C_SP_9MPOG
## A T C G
## A 0 0 2 6
## T 0 0 15 0
## C 1 0 0 0
## G 0 1 0 0
##
## $DCKP106F_BM_9MPOG
## A T C G
## A 0 86 86 87
## T 4 0 64 2
## C 7 8 0 0
## G 3 6 86 0
##
## $DCKP251A_SP_POSCTRL
## A T C G
## A 0 0 1 62
## T 1 0 57 0
## C 6 5 0 1
## G 7 4 0 0
##
## $DCKP251B_SP_POSCTRL
## A T C G
## A 0 1 0 43
## T 131 0 161 1
## C 5 96 0 0
## G 0 1 0 0
##
## $DCKP251C_SP_POSCTRL
## A T C G
## A 0 0 0 7
## T 0 0 10 0
## C 0 0 0 0
## G 0 1 0 0
##
## $DCKP212E_SP_NEGCTRL
## A T C G
## A 0 1 0 45
## T 0 0 58 3
## C 4 6 0 0
## G 5 2 0 0
##
## $DCKP223H_SP_NEGCTRL
## A T C G
## A 0 0 0 102
## T 1 0 26 0
## C 4 3 0 0
## G 0 1 0 0
##
## $DCKP223I_SP_NEGCTRL
## A T C G
## A 0 0 1 56
## T 0 0 64 2
## C 6 7 0 0
## G 1 1 0 0
When aggregating the counts per group, I will divide the total number of each substitution by the number of samples in the group, to give a per-sample average, as the groups may not contain the same number of samples.
get.group.conversions <- function(x) {
conv.tables <- list()
for (i in 1:length(unique(x$group))) {
xx <- x[which(x$group == unique(x$group)[i]),]
conversions <- strsplit(xx$conversions,';')
result.table <- matrix(nrow=4, ncol=4, 0)
colnames(result.table) <- c('A', 'T', 'C', 'G')
rownames(result.table) <- c('A', 'T', 'C', 'G')
for (j in 1:length(conversions)) {
for (k in 1:length(conversions[[j]])) {
conv.val <- unlist(strsplit(conversions[[j]][k],'='))
conv <- unlist(strsplit(conv.val[1],':'))
result.table[conv[1], conv[2]] <- result.table[conv[1], conv[2]] + as.numeric(conv.val[2])
}
}
result.table <- result.table / length(unique(xx$sample))
conv.tables[[i]] <- result.table
}
names(conv.tables) <- unique(x$group)
return(conv.tables)
}
conversion.tables <- get.group.conversions(dd.filtered)
print(conversion.tables)
## $BM_5DPOG
## A T C G
## A 0.00000 9.333333 9.00000 74.83333
## T 23.00000 0.000000 81.16667 14.66667
## C 14.16667 27.333333 0.00000 4.50000
## G 20.00000 7.500000 6.50000 0.00000
##
## $SP_5DPOG
## A T C G
## A 0.00 7.75 1.50 28.75
## T 16.50 0.00 55.00 6.25
## C 6.00 14.00 0.00 6.75
## G 10.25 2.75 5.25 0.00
##
## $BM_2MPOG
## A T C G
## A 0.0 6.2 7.4 89.6
## T 11.8 0.0 93.4 10.6
## C 12.4 33.6 0.0 4.8
## G 18.0 7.2 1.6 0.0
##
## $SP_2MPOG
## A T C G
## A 0.00 4.50 3.0 75.5
## T 45.00 0.00 75.5 4.0
## C 7.25 61.75 0.0 1.5
## G 54.25 3.75 29.0 0.0
##
## $BM_4MPOG
## A T C G
## A 0.000000 1 2.6666667 72.333333
## T 39.666667 0 69.3333333 3.333333
## C 8.000000 12 0.0000000 2.000000
## G 8.333333 4 0.6666667 0.000000
##
## $SP_4MPOG
## A T C G
## A 0.000000 0.6666667 2.00000 55
## T 11.000000 0.0000000 63.66667 20
## C 5.333333 8.0000000 0.00000 0
## G 5.000000 5.6666667 0.00000 0
##
## $BM_5MPOG
## A T C G
## A 0 2 1 53
## T 5 0 55 3
## C 10 10 0 0
## G 6 4 0 0
##
## $SP_9MPOG
## A T C G
## A 0.0000000 0.3333333 1.00000 19.3333333
## T 1.3333333 0.0000000 24.66667 0.3333333
## C 0.6666667 4.6666667 0.00000 0.0000000
## G 1.0000000 1.3333333 0.00000 0.0000000
##
## $BM_9MPOG
## A T C G
## A 0 86 86 87
## T 4 0 64 2
## C 7 8 0 0
## G 3 6 86 0
##
## $SP_POSCTRL
## A T C G
## A 0.000000 0.3333333 0.3333333 37.3333333
## T 44.000000 0.0000000 76.0000000 0.3333333
## C 3.666667 33.6666667 0.0000000 0.3333333
## G 2.333333 2.0000000 0.0000000 0.0000000
##
## $SP_NEGCTRL
## A T C G
## A 0.0000000 0.3333333 0.3333333 67.666667
## T 0.3333333 0.0000000 49.3333333 1.666667
## C 4.6666667 5.3333333 0.0000000 0.000000
## G 2.0000000 1.3333333 0.0000000 0.000000